To be displayed on a big screen.
Coral cover predictions at multiple scales
Trends with indicators
Model validation diagnostics
Model fit
Model residuals diagnostics
Leave-out-data analysis
The leave-out-data approach evaluates the influence of specific observations using prediction-performance measures. The full model is fitted to a training dataset composed of a random sample of observations, and the prediction-performance measures are computed on the left-out observations.
We found that the predictive performance of the spatio-temporal model is more sensitive to the number of monitoring locations than to the number of replicated years: accuracy declines when sites or reefs are removed from the dataset (Figure 8), whereas removing years of observations has less effect.
Five validation tests are developed:
- rm(20% obs): 20% of data were randomly removed without structure.
- rm(20% reef): 20% of reefs were randomly removed.
- rm(20% site): 20% of sites were randomly removed.
- rm(20% transect): 20% of transects were randomly removed.
- rm(3YRS): 3 years of observations were removed within each location (locations with fewer than 4 years of data were not used).
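The structured removals above can be sketched as a group-wise hold-out: a random 20% of the groups (reefs, sites or transects) is selected, and every observation in those groups goes to the leave-out set. This is a minimal illustration rather than the code used for the analysis; the function `holdout_groups`, the record fields (`obs_id`, `reef`, `cover`) and the toy data are assumptions made for the example.

```python
import random


def holdout_groups(records, key, frac=0.2, seed=0):
    """Split records into (train, test) by leaving out a random fraction of groups.

    `key` names the grouping field, e.g. "reef", "site" or "transect".
    Using a unique observation id as the key gives unstructured removal.
    """
    rng = random.Random(seed)
    groups = sorted({r[key] for r in records})
    n_out = max(1, round(frac * len(groups)))
    left_out = set(rng.sample(groups, n_out))
    train = [r for r in records if r[key] not in left_out]
    test = [r for r in records if r[key] in left_out]
    return train, test


# Toy data (placeholder values): 10 reefs with 5 observations each.
records = [
    {"obs_id": i, "reef": f"reef_{i % 10}", "cover": 20.0 + i % 7}
    for i in range(50)
]

train_reef, test_reef = holdout_groups(records, key="reef")   # rm(20% reef)
train_obs, test_obs = holdout_groups(records, key="obs_id")   # rm(20% obs)
```

Removing whole reefs or sites tests prediction at unmonitored locations, whereas unstructured removal mostly tests interpolation between nearby observations.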
Four predictive measures are used:
- 95% interval coverage error (CvgErr): evaluates how often the 95% prediction intervals include the true observations, with the goal of capturing the true values 95% of the time. It is estimated as follows:
\[ \text{CvgErr}(y, \ell, u) \;=\; \left| 0.95 \;-\; \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}\!\left( \ell_i < y_i < u_i \right) \right| \]
where \(y = \{y_1, y_2, \dots, y_n\}\) are the coral cover observations, \(\ell\) and \(u\) are the lower and upper bounds of the predictive intervals, \(n\) is the total number of predictions, and \(\mathbf{1}(\cdot)\) is the indicator function (1 if the condition is true, 0 otherwise).
- 95% interval score (IS): rewards prediction intervals that include the true observations (accuracy) and penalizes those that are too narrow or too wide (precision). It is computed as follows:
\[ \text{IS}_{95} \;=\; \frac{1}{n} \sum_{i=1}^{n} \Bigg[ (u_i - \ell_i) + \frac{2}{\alpha} (\ell_i - y_i)\,\mathbf{1}(y_i < \ell_i) + \frac{2}{\alpha} (y_i - u_i)\,\mathbf{1}(y_i > u_i) \Bigg] \]
where \(\alpha = 0.05\), \(\ell\) and \(u\) are the lower and upper bounds of the predictive intervals, \(n\) is the total number of predictions, and \(y\) is the observed coral cover.
- Root-mean-squared prediction error (RMSPE): measures how far model predictions are from the true observations, without accounting for uncertainty.
\[ \text{RMSPE} \;=\; \sqrt{ \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 } \]
where \(y\) and \(\hat{y}\) are the observed and predicted coral cover values, respectively, and \(n\) the total number of observations.
- Continuous Ranked Probability Score (CRPS): measures the quality of the predictions over the entire predictive probability distribution, penalizing predictions that are inaccurate, imprecise or overconfident.
\[ \text{CRPS}(F, y) \;=\; \sigma \left[ z \left( 2 \Phi(z) - 1 \right) \;+\; 2 \,\phi(z) \;-\; \frac{1}{\sqrt{\pi}} \right], \quad z = \frac{y - \mu}{\sigma} \]
where \(y\) is the observed coral cover value, \(\mu\) and \(\sigma\) are the mean and standard deviation of the predictive normal distribution, \(\phi(\cdot)\) is the standard normal probability density function and \(\Phi(\cdot)\) is the standard normal cumulative distribution function.
Each of these predictive measures yields a single number, with lower scores indicating better performance.
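The four measures translate directly into a few lines of code. The sketch below uses only the Python standard library and follows the formulas above term by term; the function names are illustrative. The CRPS is written for a single observation, so averaging it over observations to get one score per test is an assumption of this example.

```python
import math


def coverage_error(y, lower, upper, nominal=0.95):
    """|nominal - share of observations strictly inside their interval|."""
    inside = sum(1 for yi, lo, up in zip(y, lower, upper) if lo < yi < up)
    return abs(nominal - inside / len(y))


def interval_score(y, lower, upper, alpha=0.05):
    """Mean interval width plus a 2/alpha penalty for each miss."""
    total = 0.0
    for yi, lo, up in zip(y, lower, upper):
        total += up - lo
        if yi < lo:
            total += (2 / alpha) * (lo - yi)
        elif yi > up:
            total += (2 / alpha) * (yi - up)
    return total / len(y)


def rmspe(y, y_hat):
    """Root-mean-squared difference between observed and predicted values."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y))


def crps_normal(y, mu, sigma):
    """Closed-form CRPS of a normal predictive distribution, one observation."""
    z = (y - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)
    cdf = 0.5 * (1 + math.erf(z / math.sqrt(2)))
    return sigma * (z * (2 * cdf - 1) + 2 * pdf - 1 / math.sqrt(math.pi))


def mean_crps_normal(y, mu, sigma):
    """Average CRPS over observations (averaging is an assumption here)."""
    return sum(crps_normal(a, m, s) for a, m, s in zip(y, mu, sigma)) / len(y)
```

All four functions return non-negative values for valid inputs, and a smaller value is better in each case.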
Basis function exploratory analysis
The aim of this analysis is to explore the influence of the basis function formulation, with a focus on the temporal dimension. To do this, we compare the performance of four models that differ in the number of temporal basis functions:
- Full: the number and locations of the temporal basis functions are determined automatically by FRK, as adopted by ReefCloud.
- 5: five temporal basis functions randomly selected from the automatically generated FRK set.
- 3: three temporal basis functions randomly selected from the automatically generated FRK set.
- 1: one temporal basis function randomly selected from the automatically generated FRK set.
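The four scenarios above amount to drawing random subsets of a full set of temporal basis functions. FRK is an R package, so the Python sketch below only illustrates that subsetting logic and does not reproduce the FRK workflow. The knot positions, the radius and the bisquare form of the basis functions are all assumptions made for the example.

```python
import random


def bisquare(t, centre, radius):
    """Bisquare bump: 1 at the centre, 0 at or beyond `radius` from it."""
    d = abs(t - centre) / radius
    return (1 - d * d) ** 2 if d < 1 else 0.0


def design_matrix(times, centres, radius):
    """One row per time point, one column per temporal basis function."""
    return [[bisquare(t, c, radius) for c in centres] for t in times]


def random_subset(centres, k, seed=0):
    """Randomly keep k of the available basis-function centres."""
    rng = random.Random(seed)
    return sorted(rng.sample(centres, k))


years = list(range(2004, 2024))
# Hypothetical "full" set of temporal knots; not the knots FRK would produce.
full_centres = [2004 + 2.5 * j for j in range(9)]

scenarios = {"Full": full_centres}
for k in (5, 3, 1):
    scenarios[str(k)] = random_subset(full_centres, k)

# Temporal design matrices, one per scenario (radius is an assumed value).
designs = {name: design_matrix(years, c, radius=5.0)
           for name, c in scenarios.items()}
```

With fewer columns in the temporal design matrix, the model has less freedom to follow year-to-year changes.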
We found that the estimation of regional trends (Figure 10) is more strongly influenced by the number of temporal basis functions than the attribution of coral loss (Figure 11), which is less affected.